A Real-Time Scene Text to Speech System

Authors

  • Lukas Neumann
  • Jiri Matas
Abstract

  • The system is based on an efficient end-to-end method for real-time scene text localization and recognition [1,2,3].
  • Individual characters are detected as Class-Specific Extremal Regions (CSERs) [4].
  • An efficient sequential classifier selects only those ERs with locally maximal probability p(region|character), with complexity linear in the number of image pixels.
  • The stability requirement of MSERs [5] is dropped; the detector has a lower memory footprint and copes better with blurred, noisy and low-contrast text.
  • A novel sequential classifier exploits more computationally expensive features without a negative impact on performance.
  • Recognized text from subsequent frames is aggregated and sent to a speech synthesizer.
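The idea of keeping only ERs with locally maximal probability can be illustrated with a small sketch. It assumes, hypothetically, that a classifier has already produced one value of p(region|character) per threshold along a single chain of nested extremal regions. The function name `select_local_maxima`, the cut-off `p_min` and the sample values are illustrative assumptions, not part of the authors' implementation.

```python
from typing import List, Sequence


def select_local_maxima(probs: Sequence[float], p_min: float = 0.5) -> List[int]:
    """Return indices where probs has a local maximum of at least p_min.

    probs[i] stands for a hypothetical classifier output p(region|character)
    for the region obtained at the i-th threshold along one chain of nested
    extremal regions. A plateau of equal values is reported once, at its
    first index.
    """
    selected = []
    n = len(probs)
    i = 0
    while i < n:
        # Extend j to the end of a plateau of equal values starting at i.
        j = i
        while j + 1 < n and probs[j + 1] == probs[i]:
            j += 1
        # A maximum must be strictly higher than both neighbours (if any).
        left_ok = i == 0 or probs[i - 1] < probs[i]
        right_ok = j == n - 1 or probs[j + 1] < probs[i]
        if left_ok and right_ok and probs[i] >= p_min:
            selected.append(i)
        i = j + 1
    return selected


# Assumed example values for one chain of nested regions.
chain = [0.10, 0.35, 0.80, 0.60, 0.20, 0.55, 0.90, 0.90, 0.40]
print(select_local_maxima(chain, p_min=0.5))  # prints [2, 6]
```

Each value is visited a constant number of times, so the sketch runs in time linear in the length of the chain. It says nothing about how the probabilities themselves are computed.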

Similar resources

Cipher text only attack on speech time scrambling systems using correction of audio spectrogram

Recently permutation multimedia ciphers were broken in a chosen-plaintext scenario. That attack models a very resourceful adversary which may not always be the case. To show insecurity of these ciphers, we present a cipher-text only attack on speech permutation ciphers. We show inherent redundancies of speech can pave the path for a successful cipher-text only attack. To that end, regularities ...

MT3S: Mobile Turkish Scene Text-to-Speech System for the Visually Impaired

Reading text is one of the essential needs of the visually impaired people. We developed a mobile system that can read Turkish scene and book text, using a fast gradient-based multi-scale text detection algorithm for real-time operation and Tesseract OCR engine for character recognition. We evaluated the OCR accuracy and running time of our system on a new, publicly available mobile Turkish sce...

A 3d audio-visual animated agent for expressive conversational question answering

This paper reports on the ACQA (Animated agent for Conversational Question Answering) project conducted at LIMSI. The aim is to design an expressive animated conversational agent (ACA) for conducting research along two main lines: 1/ perceptual experiments (eg perception of expressivity and 3D movements in both audio and visual channels): 2/ design of human-computer interfaces requiring head mo...

Audio-visual Analysis of Multimedia Documents for Automatic Topic Identification

This paper presents a system that shall automatically scan multimedia data like TV or radio broadcasts for the presence of specific topics and, whenever topics of users’ interests are detected, alert the related user. Our current work on the three main modules of the system will be shown. (1) The speech recognition system (with 18.7 % WER) is already among the most advanced German broadcast spe...

Automatic topic identification in multimedia broadcast data

This paper presents a system that shall automatically scan multimedia data like TV or radio broadcasts for the presence of specific topics and, whenever topics of users’ interests are detected, alert the related user. Our current work on the three main modules of the system will be shown. (1) The speech recognition system (with 18.7 % WER) is already among the most advanced German broadcast spe...

Publication date: 2012